1. Data

Load the student scores for the test:

## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 3988 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 5501 rows [1, 2, 3,
## 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].

Show data summary

skim(test_scores)
Data summary
Name test_scores
Number of rows 8993
Number of columns 28
_______________________
Column type frequency:
character 4
numeric 24
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
test_version 0 1 3 4 0 2 0
year 0 1 7 7 0 9 0
class 8993 0 NA NA 0 0 0
AnonID 0 1 9 9 0 8899 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
A1_B1 39 1.00 4.38 1.48 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A2_B0 5500 0.39 4.23 1.73 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A3_B2 211 0.98 3.62 2.24 0 0.00 5.0 5.00 5 ▃▁▁▁▇
A4_B3 132 0.99 4.00 1.43 0 3.00 5.0 5.00 5 ▁▁▂▂▇
A5_B4 375 0.96 4.20 1.83 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A6_B5 1377 0.85 1.99 1.89 0 0.00 2.0 2.00 5 ▇▇▁▁▅
A7_B6 716 0.92 4.00 1.93 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A8_B0 5500 0.39 3.03 2.44 0 0.00 5.0 5.00 5 ▅▁▁▁▇
A9_B8 199 0.98 4.37 1.66 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A10_B9 244 0.97 3.62 1.90 0 2.50 5.0 5.00 5 ▂▁▃▁▇
A11_B0 5500 0.39 3.89 1.67 0 2.50 5.0 5.00 5 ▁▁▂▁▇
A12_B12 371 0.96 4.20 1.74 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A13_B13 304 0.97 3.85 1.85 0 2.00 5.0 5.00 5 ▂▂▁▁▇
A14_B14 576 0.94 2.97 1.91 0 2.00 2.0 5.00 5 ▃▆▁▁▇
A15_B15 352 0.96 3.60 2.01 0 2.50 5.0 5.00 5 ▂▁▂▁▇
A16_B16 270 0.97 4.49 1.51 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A17_B17 200 0.98 4.62 1.33 0 5.00 5.0 5.00 5 ▁▁▁▁▇
A18_B18 367 0.96 3.77 2.15 0 5.00 5.0 5.00 5 ▂▁▁▁▇
A19_B19 1044 0.88 3.17 2.41 0 0.00 5.0 5.00 5 ▅▁▁▁▇
A20_B20 920 0.90 2.37 2.50 0 0.00 0.0 5.00 5 ▇▁▁▁▇
Total 0 1.00 69.65 21.55 0 58.50 74.0 85.50 100 ▁▁▃▇▇
A0_B7 4149 0.54 2.26 2.49 0 0.00 0.0 5.00 5 ▇▁▁▁▆
A0_B10 4515 0.50 2.77 1.73 0 1.25 2.5 4.38 5 ▃▇▂▅▇
A0_B11 4008 0.55 4.18 1.41 0 3.00 5.0 5.00 5 ▁▂▁▁▇

There are some NAs in the data. These occur exclusively in the results from the post version, where Moodle recorded a null response as NA (i.e. cases where students did not give an answer that could be graded).

For this analysis, we replace the NA scores with 0, treating a missing response as an incorrect answer.
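
In base R terms, the replacement for a single column of scores amounts to the following (a trivial sketch with hypothetical values):

```r
# Hypothetical item scores, where NA marks a missing response
scores <- c(5, NA, 2.5, NA)

# Replace every NA with 0, treating missing responses as incorrect
scores[is.na(scores)] <- 0
scores  # 5.0 0.0 2.5 0.0
```

The pipeline below does the same thing across all post-test item columns at once, using tidyr's replace_na() inside across().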

test_scores = bind_rows(
  "pre" = test_scores_pre %>%
    replace_names(old_names = test_versions$pre, new_names = test_versions$label),
  "post" = test_scores_post %>%
    mutate(across(matches("^B\\d"), ~ replace_na(., 0))) %>% 
    replace_names(old_names = test_versions$post, new_names = test_versions$label),
  .id = "test_version"
) %>%
  # we also don't want the Total column, as it skews the shading of the data summary tables below
  select(-Total)

Show data summary

skim(test_scores)
Data summary
Name test_scores
Number of rows 8993
Number of columns 27
_______________________
Column type frequency:
character 4
numeric 23
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
test_version 0 1 3 4 0 2 0
year 0 1 7 7 0 9 0
class 8993 0 NA NA 0 0 0
AnonID 0 1 9 9 0 8899 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
A1_B1 0 1.00 4.36 1.50 0 5.00 5.00 5.00 5 ▁▁▁▁▇
A2_B0 5500 0.39 4.23 1.73 0 5.00 5.00 5.00 5 ▁▁▁▁▇
A3_B2 0 1.00 3.53 2.28 0 0.00 5.00 5.00 5 ▃▁▁▁▇
A4_B3 0 1.00 3.94 1.50 0 3.00 5.00 5.00 5 ▁▁▂▂▇
A5_B4 0 1.00 4.03 1.98 0 5.00 5.00 5.00 5 ▂▁▁▁▇
A6_B5 0 1.00 1.69 1.89 0 0.00 2.00 2.00 5 ▇▆▁▁▃
A7_B6 0 1.00 3.68 2.14 0 2.50 5.00 5.00 5 ▃▁▁▁▇
A8_B0 5500 0.39 3.03 2.44 0 0.00 5.00 5.00 5 ▅▁▁▁▇
A9_B8 0 1.00 4.27 1.77 0 5.00 5.00 5.00 5 ▂▁▁▁▇
A10_B9 0 1.00 3.52 1.97 0 2.50 5.00 5.00 5 ▂▁▃▁▇
A11_B0 5500 0.39 3.89 1.67 0 2.50 5.00 5.00 5 ▁▁▂▁▇
A12_B12 0 1.00 4.02 1.90 0 5.00 5.00 5.00 5 ▂▁▁▁▇
A13_B13 0 1.00 3.72 1.95 0 2.00 5.00 5.00 5 ▂▂▁▁▇
A14_B14 0 1.00 2.78 1.99 0 1.00 2.00 5.00 5 ▅▆▁▁▇
A15_B15 0 1.00 3.46 2.09 0 1.25 5.00 5.00 5 ▃▁▂▁▇
A16_B16 0 1.00 4.36 1.67 0 5.00 5.00 5.00 5 ▁▁▁▁▇
A17_B17 0 1.00 4.51 1.48 0 5.00 5.00 5.00 5 ▁▁▁▁▇
A18_B18 0 1.00 3.62 2.24 0 0.00 5.00 5.00 5 ▃▁▁▁▇
A19_B19 0 1.00 2.81 2.48 0 0.00 5.00 5.00 5 ▆▁▁▁▇
A20_B20 0 1.00 2.13 2.47 0 0.00 0.00 5.00 5 ▇▁▁▁▆
A0_B7 3493 0.61 1.99 2.45 0 0.00 0.00 5.00 5 ▇▁▁▁▅
A0_B10 3493 0.61 2.25 1.89 0 0.00 1.88 4.38 5 ▇▆▂▃▇
A0_B11 3493 0.61 3.78 1.81 0 2.00 5.00 5.00 5 ▂▂▁▁▇


The number of responses from each cohort:

test_scores %>% 
  group_by(year) %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
year n
2013-14 880
2014-15 1011
2015-16 777
2016-17 825
2017-18 926
2018-19 1159
2019-20 1237
2020-21 1182
2021-22 996

The distribution of raw scores is roughly normal and broadly similar across years:

total_scores <- test_scores %>% mutate(total = rowSums(across(where(is.numeric)), na.rm = TRUE))

total_scores %>%
  ggplot(aes(x = total)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(vars(year)) +
  theme_minimal()

Mean and standard deviation for each item (note this table is very wide - see the scroll bar at the bottom!):

test_scores %>% 
  select(-class, -test_version) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  rename(complete = complete_rate) %>% 
  # make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
  pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>% 
  # put the columns in order by year
  select(sort(names(.))) %>% 
  select(skim_variable, everything()) %>% 
  # use GT to make the table look nice
  gt(rowname_col = "skim_variable") %>% 
  # group the columns from each year
  tab_spanner_delim(delim = "__") %>%
  fmt_number(columns = contains("numeric"), decimals = 2) %>%
  fmt_percent(columns = contains("complete"), decimals = 0) %>% 
  # change all the numeric.mean and numeric.sd column names to Mean and SD
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
  ) %>% 
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
  ) %>%
  data_color(
    columns = contains("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
2013-14 2014-15 2015-16 2016-17 2017-18 2018-19 2019-20 2020-21 2021-22
complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD
AnonID 100% 0 NA NA 100% 0 NA NA 100% 0 NA NA 100% 0 NA NA 100% 0 NA NA 100% 0 NA NA 100% 0 NA NA 100% 0 NA NA 100% 0 NA NA
A1_B1 100% 0 4.20 1.55 100% 0 4.23 1.53 100% 0 4.30 1.50 100% 0 4.30 1.47 100% 0 4.43 1.50 100% 0 4.37 1.54 100% 0 4.49 1.42 100% 0 4.48 1.42 100% 0 4.36 1.59
A2_B0 100% 0 4.18 1.78 100% 0 4.15 1.79 100% 0 4.33 1.65 100% 0 4.28 1.69 0% 926 NaN NA 0% 1159 NaN NA 0% 1237 NaN NA 0% 1182 NaN NA 0% 996 NaN NA
A3_B2 100% 0 3.20 2.40 100% 0 3.03 2.44 100% 0 3.22 2.39 100% 0 3.38 2.34 100% 0 3.57 2.26 100% 0 3.64 2.23 100% 0 3.85 2.10 100% 0 3.79 2.14 100% 0 3.83 2.12
A4_B3 100% 0 3.55 1.56 100% 0 3.59 1.57 100% 0 3.72 1.56 100% 0 3.87 1.52 100% 0 4.03 1.47 100% 0 4.06 1.43 100% 0 4.07 1.47 100% 0 4.17 1.41 100% 0 4.21 1.40
A5_B4 100% 0 3.85 2.10 100% 0 3.92 2.06 100% 0 4.05 1.96 100% 0 4.07 1.95 100% 0 3.93 2.05 100% 0 4.03 1.98 100% 0 4.07 1.95 100% 0 4.20 1.83 100% 0 4.05 1.96
A6_B5 100% 0 1.37 1.82 100% 0 1.40 1.81 100% 0 1.70 1.95 100% 0 1.88 1.97 100% 0 1.73 1.82 100% 0 1.65 1.83 100% 0 1.75 1.86 100% 0 1.82 1.90 100% 0 1.85 1.96
A7_B6 100% 0 3.36 2.27 100% 0 3.22 2.32 100% 0 3.46 2.24 100% 0 3.84 2.07 100% 0 3.77 2.09 100% 0 3.80 2.06 100% 0 3.90 2.02 100% 0 3.90 2.01 100% 0 3.69 2.14
A8_B0 100% 0 2.80 2.48 100% 0 2.82 2.48 100% 0 3.13 2.42 100% 0 3.43 2.32 0% 926 NaN NA 0% 1159 NaN NA 0% 1237 NaN NA 0% 1182 NaN NA 0% 996 NaN NA
A9_B8 100% 0 4.05 1.96 100% 0 4.13 1.89 100% 0 4.16 1.87 100% 0 4.32 1.72 100% 0 4.20 1.83 100% 0 4.37 1.66 100% 0 4.37 1.66 100% 0 4.40 1.63 100% 0 4.32 1.71
A10_B9 100% 0 3.01 2.10 100% 0 3.06 2.09 100% 0 3.14 2.05 100% 0 3.35 2.04 100% 0 3.62 1.87 100% 0 3.63 1.90 100% 0 3.73 1.89 100% 0 3.90 1.79 100% 0 3.90 1.79
A11_B0 100% 0 3.84 1.66 100% 0 3.90 1.67 100% 0 3.89 1.67 100% 0 3.94 1.70 0% 926 NaN NA 0% 1159 NaN NA 0% 1237 NaN NA 0% 1182 NaN NA 0% 996 NaN NA
A12_B12 100% 0 3.73 2.08 100% 0 3.62 2.14 100% 0 3.83 2.02 100% 0 3.96 1.95 100% 0 4.12 1.82 100% 0 4.14 1.80 100% 0 4.21 1.76 100% 0 4.23 1.73 100% 0 4.19 1.77
A13_B13 100% 0 3.46 2.09 100% 0 3.39 2.10 100% 0 3.56 2.00 100% 0 3.75 1.96 100% 0 3.68 1.94 100% 0 3.81 1.89 100% 0 3.87 1.87 100% 0 3.94 1.81 100% 0 3.83 1.90
A14_B14 100% 0 2.27 1.96 100% 0 2.33 1.97 100% 0 2.65 1.98 100% 0 2.80 2.00 100% 0 2.82 1.96 100% 0 2.87 1.93 100% 0 3.01 1.96 100% 0 3.01 1.96 100% 0 3.04 2.01
A15_B15 100% 0 3.34 2.13 100% 0 3.24 2.15 100% 0 3.43 2.09 100% 0 3.40 2.17 100% 0 3.50 2.07 100% 0 3.49 2.06 100% 0 3.42 2.12 100% 0 3.71 1.98 100% 0 3.56 2.09
A16_B16 100% 0 4.24 1.79 100% 0 4.13 1.89 100% 0 4.25 1.79 100% 0 4.31 1.73 100% 0 4.42 1.61 100% 0 4.40 1.63 100% 0 4.38 1.65 100% 0 4.45 1.57 100% 0 4.58 1.39
A17_B17 100% 0 4.39 1.63 100% 0 4.26 1.78 100% 0 4.42 1.60 100% 0 4.39 1.64 100% 0 4.62 1.33 100% 0 4.58 1.39 100% 0 4.60 1.36 100% 0 4.62 1.33 100% 0 4.64 1.30
A18_B18 100% 0 3.23 2.39 100% 0 3.17 2.41 100% 0 3.41 2.33 100% 0 3.54 2.28 100% 0 3.70 2.20 100% 0 3.65 2.22 100% 0 3.84 2.11 100% 0 3.87 2.09 100% 0 3.95 2.04
A19_B19 100% 0 2.14 2.47 100% 0 2.08 2.47 100% 0 2.43 2.50 100% 0 2.76 2.49 100% 0 2.86 2.48 100% 0 3.05 2.44 100% 0 3.12 2.42 100% 0 3.22 2.39 100% 0 3.24 2.39
A20_B20 100% 0 1.70 2.37 100% 0 1.67 2.36 100% 0 1.86 2.42 100% 0 2.27 2.49 100% 0 2.20 2.48 100% 0 2.29 2.49 100% 0 2.34 2.50 100% 0 2.23 2.49 100% 0 2.43 2.50
A0_B7 0% 880 NaN NA 0% 1011 NaN NA 0% 777 NaN NA 0% 825 NaN NA 100% 0 1.88 2.42 100% 0 1.90 2.43 100% 0 1.98 2.45 100% 0 2.14 2.47 100% 0 2.03 2.46
A0_B10 0% 880 NaN NA 0% 1011 NaN NA 0% 777 NaN NA 0% 825 NaN NA 100% 0 1.99 1.84 100% 0 2.06 1.84 100% 0 2.33 1.90 100% 0 2.47 1.92 100% 0 2.37 1.93
A0_B11 0% 880 NaN NA 0% 1011 NaN NA 0% 777 NaN NA 0% 825 NaN NA 100% 0 3.72 1.85 100% 0 3.76 1.80 100% 0 3.84 1.78 100% 0 3.82 1.80 100% 0 3.76 1.85

2. Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Since the combined data on the two versions of the test have large portions of “missing” data (e.g. no responses to new questions from students who completed the old test), it is not possible to carry out the factor analysis as in the analyse-test script, since the factor analysis routine does not work with missing data.

Instead, in the next section we proceed directly to fitting the IRT model, and using the \(Q_3\) check for local independence. In the final section, we also run a factor analysis for the data from the new test only.

3. Fitting a generalised partial credit IRT model

The mirt implementation of the generalised partial credit model (gpcm) requires the partial marks to be consecutive integers. We therefore adjust our scores into that form (e.g. replacing scores of 0, 2.5 and 5 with 1, 2 and 3), while keeping track of the true score attached to each mark level so that we can properly compute expected scores later on.
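
As a minimal illustration of that relabelling (hypothetical scores, not using the helper pipeline below), each score can be mapped to its rank among the distinct non-missing values of the item:

```r
# Hypothetical partial marks for one item (NA = missing response)
scores <- c(0, 2.5, 5, 2.5, NA)

# Rank each score among the sorted distinct non-missing values,
# giving consecutive integer levels: 0 -> 1, 2.5 -> 2, 5 -> 3; NA stays NA
lev <- match(scores, sort(unique(na.omit(scores))))
```

The full pipeline below does the same mapping item by item, and additionally handles mirt's convention that two-level items are indexed from 0 (P.0, P.1) rather than from 1.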

# save just the matrix of scores
item_scores <- test_scores %>% 
  select(matches("^A\\d"))

# Determine the mark levels for each item
mark_levels <- item_scores %>% 
  pivot_longer(everything(), names_to = "item", values_to = "score") %>% 
  distinct() %>% 
  # we don't want to give a level to NA
  filter(!is.na(score)) %>% 
  arrange(parse_number(item), score) %>% 
  group_by(item) %>%
  mutate(order = row_number()) %>% 
# Note that the convention used by mirt is for items that have only 2 levels (i.e. 0 marks or full marks),
# the columns are P.0 and P.1, while other items are indexed from 1, i.e. P.1, P.2, ...
# https://github.com/philchalmers/mirt/blob/accd2383b9a4d17a4cab269717ce98434900b62c/R/probtrace.R#L57
  mutate(level = case_when(
    max(order) == 2 ~ order - 1,
    TRUE ~ order * 1.0
  )) %>% 
  mutate(levelname = paste0(item, ".P.", level))

# Use the mark_levels table to replace scores with levels
# (first pivot the data to long form, make the replacement, then pivot back to wide again)
item_scores_levelled <- item_scores %>% 
  # temporarily add row identifiers
  mutate(row = row_number()) %>% 
  pivot_longer(cols = -row, names_to = "item", values_to = "score") %>% 
  left_join(mark_levels %>% select(item, score, level), by = c("item", "score")) %>% 
  select(-score) %>% 
  pivot_wider(names_from = "item", values_from = "level") %>% 
  select(-row)

Show model fitting output

irt_fit <- mirt(
  data = item_scores_levelled, # just the columns with question scores
  model = 1,          # number of factors to extract
  itemtype = "gpcm",  # generalised partial credit model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -145717.510, Max-Change: 2.56485
Iteration: 2, Log-Lik: -130984.841, Max-Change: 1.05305
Iteration: 3, Log-Lik: -129044.431, Max-Change: 0.26154
Iteration: 4, Log-Lik: -128798.276, Max-Change: 0.15657
Iteration: 5, Log-Lik: -128734.230, Max-Change: 0.15905
Iteration: 6, Log-Lik: -128705.150, Max-Change: 0.14087
Iteration: 7, Log-Lik: -128683.347, Max-Change: 0.08365
Iteration: 8, Log-Lik: -128671.511, Max-Change: 0.05435
Iteration: 9, Log-Lik: -128662.797, Max-Change: 0.01875
Iteration: 10, Log-Lik: -128658.284, Max-Change: 0.02391
Iteration: 11, Log-Lik: -128655.946, Max-Change: 0.02953
Iteration: 12, Log-Lik: -128654.673, Max-Change: 0.02986
Iteration: 13, Log-Lik: -128653.305, Max-Change: 0.00863
Iteration: 14, Log-Lik: -128652.870, Max-Change: 0.00219
Iteration: 15, Log-Lik: -128652.770, Max-Change: 0.00167
Iteration: 16, Log-Lik: -128652.637, Max-Change: 0.00120
Iteration: 17, Log-Lik: -128652.615, Max-Change: 0.00012
Iteration: 18, Log-Lik: -128652.613, Max-Change: 0.00025
Iteration: 19, Log-Lik: -128652.612, Max-Change: 0.00016
Iteration: 20, Log-Lik: -128652.610, Max-Change: 0.00031
Iteration: 21, Log-Lik: -128652.610, Max-Change: 0.00013
Iteration: 22, Log-Lik: -128652.609, Max-Change: 0.00011
Iteration: 23, Log-Lik: -128652.608, Max-Change: 0.00021
Iteration: 24, Log-Lik: -128652.607, Max-Change: 0.00008
## 
## Calculating information matrix...

Local independence

We compute Yen’s \(Q_3\) (1984) to check for any dependence between items after controlling for \(\theta\). This gives a score for each pair of items, with absolute values above 0.2 regarded as problematic (see DeMars, p. 48).
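
The irt_residuals object used below is not constructed in this excerpt; presumably it holds the matrix of pairwise \(Q_3\) values, e.g. from mirt's residuals(irt_fit, type = "Q3") coerced to a data frame. The statistic itself is just the correlation between item residuals after subtracting the model-expected scores; a self-contained toy sketch with two locally independent (hypothetical) items:

```r
set.seed(1)
theta <- rnorm(500)                # latent abilities
p <- function(th) plogis(th)       # stand-in for the model's expected score
item1 <- rbinom(500, 1, p(theta))  # two items that depend on theta only
item2 <- rbinom(500, 1, p(theta))
res1 <- item1 - p(theta)           # residuals after controlling for theta
res2 <- item2 - p(theta)
Q3 <- cor(res1, res2)              # near 0: no residual dependence
```

For a locally dependent pair, the residuals would remain correlated, pushing \(|Q_3|\) above the 0.2 threshold.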

irt_residuals  %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows little evidence of dependence for most item pairs, with only one giving cause for concern:

irt_residuals %>%
  rownames_to_column(var = "item1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(item1) < parse_number(item2)) %>%
  gt()
item1 item2 Q3_score
A16_B16 A17_B17 0.25521

This pair of questions on variations of the chain rule was also identified by the analysis of the first version of the test.

Given that this violation of the local independence assumption is very mild, we proceed using this model.

Model parameters

We then compute factor score estimates and add them to the existing data frame, to keep everything in one place. We use the fscores() function from the mirt package, which takes a fitted model object and computes factor score estimates according to the specified method; here we use the default “expected a-posteriori” (EAP) method.

test_scores <- test_scores %>%
  # fscores() returns a one-column matrix; take that column as a plain vector
  mutate(F1 = fscores(irt_fit, method = "EAP")[, 1])

We can also extract the model coefficient estimates using the generic coef() function. We set the IRTpars argument to TRUE so that the slope-intercept parameters are converted into traditional IRT parameters.

irt_coefs <- coef(irt_fit, IRTpars = TRUE)

The resulting object irt_coefs is a list, with one element for each question, plus an additional GroupPars element that we won’t be using. For each question, the object records several values:

  • a is discrimination
  • b is difficulty
  • endpoints of the 95% confidence intervals are also shown
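
For reference, these parameters enter the generalised partial credit model in roughly the standard IRT parameterisation (a sketch; mirt’s internal indexing of the step parameters may differ slightly):

\[
P(X_j = k \mid \theta) = \frac{\exp\!\left(\sum_{v=1}^{k} a_j(\theta - b_{jv})\right)}{\sum_{m=0}^{K_j} \exp\!\left(\sum_{v=1}^{m} a_j(\theta - b_{jv})\right)}, \qquad k = 0, 1, \ldots, K_j,
\]

with the empty sum (for \(k = 0\)) taken to be zero. A larger discrimination \(a_j\) makes adjacent mark levels easier to distinguish, and each \(b_{jv}\) marks the ability at which level \(v\) becomes more likely than level \(v-1\).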

To make this output a little more user-friendly, we use the tidy_mirt_coefs function that we have provided in common-functions.R to produce a single data frame with a row for each question.

tidy_mirt_coefs <- function(x){
  x %>%
    # melt the list element
    melt() %>%
    # convert to a tibble
    as_tibble() %>%
    # convert factors to characters
    mutate(across(where(is.factor), as.character)) %>%
    # only focus on rows where X2 is a, or starts with b (the parameters in the GPCM)
    filter(X2 == "a" | str_detect(X2, "^b")) %>%
    # in X1, relabel par (parameter) as est (estimate)
    mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
    # turn into a wider data frame
    pivot_wider(names_from = X1, values_from = value) %>% 
    rename(par = X2)
}

# use head(., -1) to remove the last element, `GroupPars`, which does not correspond to a question
tidy_irt_coefs <- map_dfr(head(irt_coefs, -1), tidy_mirt_coefs, .id = "Question")

Here is a nicely formatted table of the result:

tidy_irt_coefs %>% 
  filter(par == "a") %>% 
  select(-par) %>% 
  rename_with(.fn = ~ paste0("a_", .x), .cols = -Question) %>% 
  left_join(
    tidy_irt_coefs %>% 
      filter(str_detect(par, "^b")),
    by = "Question"
  ) %>% 
  gt(groupname_col = "Question") %>%
  # use matches() (regex) here: contains() would look for the literal string "est|_"
  fmt_number(columns = matches("est|_"), decimals = 3) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = c("est", "CI_2.5", "CI_97.5"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = c("par", "est", "CI_2.5", "CI_97.5"))
Discrimination Difficulty
a_est a_CI_2.5 a_CI_97.5 par est CI_2.5 CI_97.5
A1_B1
0.6498042 0.6039469 0.6956614 b1 -0.71654270 -0.8802811 -0.55280428
0.6498042 0.6039469 0.6956614 b2 -3.61893119 -3.8882574 -3.34960494
A2_B0
0.6301155 0.5663522 0.6938787 b1 1.01263690 0.6296772 1.39559661
0.6301155 0.5663522 0.6938787 b2 -5.03660052 -5.5871736 -4.48602742
A3_B2
1.3962706 1.3137024 1.4788388 b1 -0.81419853 -0.8677811 -0.76061599
A4_B3
0.2564078 0.2424465 0.2703690 b1 17.55677290 13.3994024 21.71414340
0.2564078 0.2424465 0.2703690 b2 -18.53411144 -22.6679577 -14.40026514
0.2564078 0.2424465 0.2703690 b3 2.30865874 1.4760844 3.14123307
0.2564078 0.2424465 0.2703690 b4 3.71818628 2.2937500 5.14262258
0.2564078 0.2424465 0.2703690 b5 -10.43564199 -11.8119677 -9.05931629
0.2564078 0.2424465 0.2703690 b6 0.64973553 0.1011823 1.19828876
0.2564078 0.2424465 0.2703690 b7 -3.31025417 -3.8333985 -2.78710984
0.2564078 0.2424465 0.2703690 b8 4.43086127 3.7811680 5.08055453
0.2564078 0.2424465 0.2703690 b9 -1.93539486 -2.6062557 -1.26453399
0.2564078 0.2424465 0.2703690 b10 -3.05857448 -3.5832469 -2.53390204
0.2564078 0.2424465 0.2703690 b11 7.96868276 7.0562188 8.88114668
0.2564078 0.2424465 0.2703690 b12 -14.92629639 -16.0585829 -13.79400986
A5_B4
1.3628424 1.2775306 1.4481542 b1 -1.34560886 -1.4195075 -1.27171021
A6_B5
0.4897225 0.4582114 0.5212336 b1 0.51215829 0.4016310 0.62268555
0.4897225 0.4582114 0.5212336 b2 8.50824198 7.7403844 9.27609958
0.4897225 0.4582114 0.5212336 b3 -6.67190462 -7.4165123 -5.92729694
A7_B6
0.8893270 0.8383670 0.9402870 b1 1.33395065 1.1624063 1.50549501
0.8893270 0.8383670 0.9402870 b2 -3.06069337 -3.2718360 -2.84955076
A8_B0
0.8708402 0.7763904 0.9652901 b1 -0.73764157 -0.8383570 -0.63692616
A9_B8
1.8753874 1.7586786 1.9920962 b1 -1.39576942 -1.4602993 -1.33123955
A10_B9
0.8977396 0.8472517 0.9482275 b1 -0.69447234 -0.7746887 -0.61425600
0.8977396 0.8472517 0.9482275 b2 -1.05314133 -1.1438435 -0.96243917
A11_B0
0.5284065 0.4825679 0.5742452 b1 -0.05164365 -0.4559668 0.35267952
0.5284065 0.4825679 0.5742452 b2 -3.64927273 -4.0551738 -3.24337165
0.5284065 0.4825679 0.5742452 b3 5.01032412 4.2468686 5.77377959
0.5284065 0.4825679 0.5742452 b4 -7.63488521 -8.5220153 -6.74775508
A12_B12
0.6690175 0.6297765 0.7082585 b1 1.33006553 1.0966172 1.56351386
0.6690175 0.6297765 0.7082585 b2 -0.54793330 -0.7719983 -0.32386835
0.6690175 0.6297765 0.7082585 b3 -4.31202385 -4.6122088 -4.01183889
A13_B13
0.6687580 0.6313361 0.7061800 b1 -0.53674184 -0.6571816 -0.41630213
0.6687580 0.6313361 0.7061800 b2 2.96793195 2.6506158 3.28524810
0.6687580 0.6313361 0.7061800 b3 -5.41392753 -5.8108857 -5.01696940
A14_B14
0.6530867 0.6192677 0.6869057 b1 1.42504291 1.2260140 1.62407186
0.6530867 0.6192677 0.6869057 b2 -3.15568713 -3.3663622 -2.94501210
0.6530867 0.6192677 0.6869057 b3 7.34229727 6.6044972 8.08009735
0.6530867 0.6192677 0.6869057 b4 -3.32992312 -4.0132797 -2.64656652
0.6530867 0.6192677 0.6869057 b5 -3.60980610 -3.9094050 -3.31020717
A15_B15
0.2996588 0.2806140 0.3187035 b1 7.49204312 6.7623355 8.22175071
0.2996588 0.2806140 0.3187035 b2 -5.70096594 -6.3245982 -5.07733373
0.2996588 0.2806140 0.3187035 b3 3.24135698 2.7820672 3.70064674
0.2996588 0.2806140 0.3187035 b4 -8.93023517 -9.6180731 -8.24239727
A16_B16
2.0821843 1.9491758 2.2151928 b1 -1.43998301 -1.5033635 -1.37660250
A17_B17
2.4983673 2.3262411 2.6704934 b1 -1.55154531 -1.6147999 -1.48829074
A18_B18
1.3671530 1.2852785 1.4490275 b1 -0.90548778 -0.9627000 -0.84827558
A19_B19
1.8448311 1.7429463 1.9467160 b1 -0.17533493 -0.2107730 -0.13989687
A20_B20
2.0506549 1.9358421 2.1654677 b1 0.25977422 0.2258841 0.29366439
A0_B7
1.1256167 1.0340423 1.2171911 b1 0.58126267 0.5184027 0.64412266
A0_B10
0.4346299 0.4061118 0.4631480 b1 4.66604020 4.1482123 5.18386809
0.4346299 0.4061118 0.4631480 b2 -4.46992913 -4.9466160 -3.99324220
0.4346299 0.4061118 0.4631480 b3 2.15641782 1.8421915 2.47064414
0.4346299 0.4061118 0.4631480 b4 -0.16165124 -0.4860565 0.16275401
0.4346299 0.4061118 0.4631480 b5 0.99733832 0.6554328 1.33924388
0.4346299 0.4061118 0.4631480 b6 -0.27429274 -0.6151218 0.06653628
0.4346299 0.4061118 0.4631480 b7 1.80937930 1.4514936 2.16726504
0.4346299 0.4061118 0.4631480 b8 -2.37031429 -2.7532597 -1.98736888
A0_B11
0.4415561 0.4104003 0.4727118 b1 2.19821926 1.7304127 2.66602585
0.4415561 0.4104003 0.4727118 b2 -4.72103582 -5.1985394 -4.24353222
0.4415561 0.4104003 0.4727118 b3 8.08197275 7.0158913 9.14805423
0.4415561 0.4104003 0.4727118 b4 -3.02649470 -4.0273930 -2.02559636
0.4415561 0.4104003 0.4727118 b5 -7.79781171 -8.5282818 -7.06734158

These values are also saved to the file output/uoe_pre-vs-post_gpcm-results.csv.

tidy_irt_coefs %>% 
  write_csv("output/uoe_pre-vs-post_gpcm-results.csv")

Information curves

theta <- seq(-6, 6, by=0.05)

info_matrix <- testinfo(irt_fit, theta, individual = TRUE)
colnames(info_matrix) <- test_versions %>% pull(label)
item_info_data <- info_matrix %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>% 
  left_join(test_versions %>% select(item = label, MATH_group), by = "item") %>% 
  mutate(item = fct_reorder(item, parse_number(item)))

Test information curve

item_info_data %>% 
  group_by(theta) %>% 
  summarise(info_y = sum(info_y)) %>% 
  ggplot(aes(x = theta, y = info_y)) +
  geom_line() +
  labs(y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre-vs-post_info.pdf", width = 6, height = 4, units = "in")

This shows that the information given by the test is skewed toward the lower end of the ability scale - i.e. it gives more accurate estimates for students whose ability is slightly below the mean.

Item information curves

Breaking this down by question helps to highlight those questions that are most/least informative:

item_info_data %>% 
  ggplot(aes(x = theta, y = info_y, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Information") +
  theme_minimal()

We can also compute the sums of different subsets of the information curves – here, we will look at the questions based on their MATH group:

item_info_data %>% 
  group_by(theta) %>% 
  summarise(
    items_all = sum(info_y),
    items_A = sum(ifelse(MATH_group == "A", info_y, 0)),
    items_B = sum(ifelse(MATH_group == "B", info_y, 0)),
    items_C = sum(ifelse(MATH_group == "C", info_y, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(items = fct_relevel(items, "all", "A", "B", "C")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
  labs(x = "Ability", y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)

This shows that the information from the MATH Group B questions peaks higher on the ability scale than that from the MATH Group A questions.

Since the number of items in each case is different, we consider instead the mean information per item:

item_info_data %>% 
  group_by(theta) %>% 
  summarise(
    items_all = sum(info_y) / n(),
    items_A = sum(ifelse(MATH_group == "A", info_y, 0)) / sum(ifelse(MATH_group == "A", 1, 0)),
    items_B = sum(ifelse(MATH_group == "B", info_y, 0)) / sum(ifelse(MATH_group == "B", 1, 0)),
    items_C = sum(ifelse(MATH_group == "C", info_y, 0)) / sum(ifelse(MATH_group == "C", 1, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(items = fct_relevel(items, "all", "A", "B", "C")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
  labs(x = "Ability", y = "Mean information per item") +
  theme_minimal()

ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B-avg.pdf", units = "cm", width = 10, height = 6)

This shows that items of each MATH group are giving broadly similar levels of information on average, but at different points on the ability scale.

Total information

Using mirt’s areainfo function, we can find the total area under the information curves.

info_gpcm <- areainfo(irt_fit, c(-4,4))
info_gpcm %>% gt()
LowerBound UpperBound Info TotalInfo Proportion nitems
-4 4 42.02264 43.40405 0.9681734 23

This shows that the total information across all items is 43.40, of which 42.02 (about 97%) lies in the range \(-4\leq\theta\leq4\).

Information by item

tidy_info <- test_versions %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(irt_fit,
               c(-4, 4),
               which.items = .x) %>% pull(TotalInfo)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
pre post MATH_group description label outcome Information
NA B10 B 2x2 system possibilities A0_B10 added 3.4740490
A14 B14 B find minimum gradient of cubic A14_B14 unchanged 3.2649596
A4 B3 A completing the square A4_B3 unchanged 3.0715660
A17 B17 A chain rule A17_B17 unchanged 2.4983673
NA B11 C using max and min functions A0_B11 added 2.2052047
A11 NA A summing arithmetic progression A11_B0 removed 2.1106414
A16 B16 A trig chain rule A16_B16 unchanged 2.0821843
A20 B20 B product rule with given values A20_B20 unchanged 2.0506549
A12 B12 A find point with given slope A12_B12 unchanged 2.0066633
A13 B13 A equation of tangent A13_B13 unchanged 2.0050597
A9 B8 A simplify logs A9_B8 unchanged 1.8753872
A19 B19 B area between curve and x-axis (in 2 parts) A19_B19 unchanged 1.8448311
A10 B9 B identify graph of rational functions A10_B9 unchanged 1.7952237
A7 B6 B graphical vector sum A7_B6 unchanged 1.7786086
A6 B5 A trig wave function A6_B5 unchanged 1.4660620
A3 B2 B composition of functions A3_B2 unchanged 1.3962664
A18 B18 A definite integral A18_B18 unchanged 1.3671471
A5 B4 A trig double angle formula A5_B4 unchanged 1.3628319
A1 B1 A properties of fractions A1_B1 unchanged 1.2979102
A2 NA A find intersection of lines A2_B0 removed 1.2595188
A15 B15 A find and classify stationary points of cubic A15_B15 unchanged 1.1948396
NA B7 B angle between 3d vectors (in context) A0_B7 added 1.1255812
A8 NA A compute angle between 3d vectors A8_B0 removed 0.8704912

Restricting instead to the range \(-2\leq\theta\leq2\):

tidy_info <- test_versions %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(irt_fit,
               c(-2, 2),
               which.items = .x) %>% pull(Info)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
pre post MATH_group description label outcome Information
NA B10 B 2x2 system possibilities A0_B10 added 3.0175812
A14 B14 B find minimum gradient of cubic A14_B14 unchanged 2.9111803
A4 B3 A completing the square A4_B3 unchanged 2.1982949
A20 B20 B product rule with given values A20_B20 unchanged 1.9746900
A17 B17 A chain rule A17_B17 unchanged 1.8835782
A19 B19 B area between curve and x-axis (in 2 parts) A19_B19 unchanged 1.7505150
NA B11 C using max and min functions A0_B11 added 1.7145125
A13 B13 A equation of tangent A13_B13 unchanged 1.6270792
A12 B12 A find point with given slope A12_B12 unchanged 1.6091576
A16 B16 A trig chain rule A16_B16 unchanged 1.5859112
A7 B6 B graphical vector sum A7_B6 unchanged 1.5205341
A9 B8 A simplify logs A9_B8 unchanged 1.4153751
A10 B9 B identify graph of rational functions A10_B9 unchanged 1.3740964
A11 NA A summing arithmetic progression A11_B0 removed 1.3213868
A3 B2 B composition of functions A3_B2 unchanged 1.1454766
A18 B18 A definite integral A18_B18 unchanged 1.0917384
A5 B4 A trig double angle formula A5_B4 unchanged 0.9525037
A6 B5 A trig wave function A6_B5 unchanged 0.9523906
NA B7 B angle between 3d vectors (in context) A0_B7 added 0.8776552
A15 B15 A find and classify stationary points of cubic A15_B15 unchanged 0.8009079
A2 NA A find intersection of lines A2_B0 removed 0.6104580
A8 NA A compute angle between 3d vectors A8_B0 removed 0.5797480
A1 B1 A properties of fractions A1_B1 unchanged 0.5687239

Evaluating the changes

item_info_data %>% 
  mutate(item = as.character(item)) %>% 
  left_join(test_versions %>% mutate(item = as.character(label)), by = "item") %>% 
  group_by(theta) %>% 
  summarise(
    items_pre = sum(ifelse(outcome %in% c("unchanged", "removed"), info_y, 0)),
    items_post = sum(ifelse(outcome %in% c("unchanged", "added"), info_y, 0)),
    items_added = sum(ifelse(outcome %in% c("added"), info_y, 0)),
    items_removed = sum(ifelse(outcome %in% c("removed"), info_y, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(panel = ifelse(items %in% c("pre", "post"), "overall", "changes") %>% fct_rev()) %>% 
  mutate(items = fct_relevel(items, "removed", "added", "pre", "post")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_brewer(palette = "Paired") +
  facet_wrap(vars(panel), scales = "free_y") +
  labs(x = "Ability", y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre-vs-post_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)

The changes have increased the total information in the test:
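The quantity being compared is the area under the test information curve over the chosen ability range, summed across the selected items. As a sketch of what areainfo approximates:

```latex
\text{TotalInfo} = \int_{-4}^{4} I(\theta)\,\mathrm{d}\theta,
\qquad
I(\theta) = \sum_{i \in \text{items}} I_i(\theta)
```

where \(I_i(\theta)\) is the information function of item \(i\).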

info_old <- areainfo(irt_fit,
                     c(-4, 4),
                     which.items = test_versions %>% filter(outcome %in% c("unchanged", "removed")) %>% pull(item_num))
info_new <- areainfo(irt_fit,
                     c(-4, 4),
                     which.items = test_versions %>% filter(outcome %in% c("unchanged", "added")) %>% pull(item_num))

versions_info <- bind_rows("Version A" = info_old,
                           "Version B" = info_new,
                           .id = "version") %>% 
  select(-LowerBound, -UpperBound, -Info, -Proportion) %>% 
  mutate(mean = TotalInfo / nitems)

versions_info %>% gt() %>%
  cols_label(
    version = "Test version",
    TotalInfo = "Total information",
    nitems = "Number of items",
    mean = "Mean information per item"
  )
Test version Total information Number of items Mean information per item
Version A 36.59921 20 1.829961
Version B 39.16340 20 1.958170

Response curves

Since the GPCM is more complicated than the 2PL, there is a characteristic curve for each possible score on each question:

trace_data <- probtrace(irt_fit, theta) %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>% 
  left_join(mark_levels %>% select(item, level = levelname, score), by = "level") %>% 
  mutate(score = as.factor(score))

trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  ggplot(aes(x = theta, y = y, colour = score)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Probability of response") +
  theme_minimal()
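The category curves above come from the generalised partial credit model. As a sketch, using a common parameterisation (which may differ in detail from mirt's internal one), the probability of scoring \(k\) out of \(m\) on an item with discrimination \(a\) and thresholds \(b_1, \dots, b_m\) is:

```latex
P(X = k \mid \theta)
= \frac{\exp\!\left(\sum_{v=1}^{k} a(\theta - b_v)\right)}
       {\sum_{c=0}^{m} \exp\!\left(\sum_{v=1}^{c} a(\theta - b_v)\right)},
\qquad k = 0, 1, \dots, m,
```

with the convention that the empty sum for \(k = 0\) is zero.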

To get a simplified picture for each question, we compute the expected score at each ability level:

expected_scores <- trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  group_by(item, theta) %>% 
  summarise(expected_score = sum(as.double(as.character(score)) * y), .groups = "drop")
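In symbols, the expected score is the probability-weighted sum of the possible scores, which is what the `sum(... * y)` above computes at each value of \(\theta\):

```latex
\mathbb{E}[X \mid \theta] = \sum_{k=0}^{m} k \, P(X = k \mid \theta)
```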

expected_scores %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Expected score") +
  theme_minimal()

The resulting curves look quite similar to those from the 2PL, allowing a similar interpretation. For instance, superimposing all the curves shows that there is a spread of difficulties (i.e. the values of \(\theta\) where the expected score is 2.5 out of 5) and that some questions are more discriminating than others (i.e. have steeper slopes):

plt <- expected_scores %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item, text = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  labs(y = "Expected score") +
  theme_minimal()

ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/uoe_pre-vs-post_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")

Test response curve

total_expected_score <- expected_scores %>% 
  group_by(theta) %>% 
  summarise(expected_score = sum(expected_score))

total_expected_score %>% 
  ggplot(aes(x = theta, y = expected_score)) +
  geom_line() +
  # geom_point(data = total_expected_score %>% filter(theta == 0)) +
  # ggrepel::geom_label_repel(data = total_expected_score %>% filter(theta == 0), aes(label = round(expected_score, 1)), box.padding = 0.5) +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  labs(y = "Expected score") +
  theme_minimal()

4. Factor analysis for the new test only

Factor analysis setup

Here we redo the factor analysis, but using only the data from the new version of the test.

item_scores_B <- test_scores %>% 
  select(-F1) %>% 
  select(-contains("B0")) %>% 
  filter(test_version == "post") %>% 
  # select only the columns with question scores (names like Ax_Bx)
  select(matches("A\\d+_B\\d+"))

The parameters package provides functions that run various checks to see if the data is suitable for factor analysis, and if so, how many factors should be retained.

structure <- check_factorstructure(item_scores_B)
n <- n_factors(item_scores_B)

Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.94).
  • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(190) = 23456.86, p < .001).

Method Agreement Procedure:

The choice of 1 dimensions is supported by 8 (34.78%) methods out of 23 (t, p, Acceleration factor, R2, VSS complexity 1, Velicer’s MAP, TLI, RMSEA).

plot(n)
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

summary(n) %>% gt()
n_Factors n_Methods
1 8
2 5
3 1
4 3
5 1
13 1
17 1
18 3

The scree plot shows the eigenvalues associated with each factor:

fa.parallel(item_scores_B, fa = "fa")

## Parallel analysis suggests that the number of factors =  6  and the number of components =  NA

Based on the method agreement procedure, the strongest support is for a 1-factor solution; the 6 factors suggested by the parallel analysis are not supported by any of the other methods. We also consider the 2-factor solution.

1 Factor

Show factanal output

fitfact <- factanal(item_scores_B, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A9_B8  A10_B9 A12_B12 A13_B13 
##    0.89    0.76    0.68    0.80    0.86    0.72    0.75    0.75    0.72    0.69 
## A14_B14 A15_B15 A16_B16 A17_B17 A18_B18 A19_B19 A20_B20   A0_B7  A0_B10  A0_B11 
##    0.59    0.86    0.77    0.74    0.80    0.68    0.71    0.85    0.65    0.69 
## 
## Loadings:
##  [1] 0.57 0.53 0.50 0.50 0.52 0.56 0.64 0.51 0.57 0.54 0.59 0.56 0.33 0.49 0.45
## [16] 0.37 0.38 0.48 0.45 0.39
## 
##                Factor1
## SS loadings       5.06
## Proportion Var    0.25
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 2339.14 on 170 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = variable), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(test_versions %>% select(question = label, description, MATH_group), by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("MATH"),
    colors = MATH_colours
  )
question factor_loading description MATH_group
A14_B14 0.6369079 find minimum gradient of cubic B
A0_B10 0.5940682 2x2 system possibilities B
A4_B3 0.5678270 completing the square A
A19_B19 0.5661081 area between curve and x-axis (in 2 parts) B
A0_B11 0.5579319 using max and min functions C
A13_B13 0.5575991 equation of tangent A
A20_B20 0.5416987 product rule with given values B
A7_B6 0.5315657 graphical vector sum B
A12_B12 0.5244957 find point with given slope A
A17_B17 0.5079599 chain rule A
A9_B8 0.5034418 simplify logs A
A10_B9 0.5004090 identify graph of rational functions B
A3_B2 0.4857094 composition of functions B
A16_B16 0.4845865 trig chain rule A
A5_B4 0.4508873 trig double angle formula A
A18_B18 0.4493676 definite integral A
A0_B7 0.3866100 angle between 3d vectors (in context) B
A15_B15 0.3777250 find and classify stationary points of cubic A
A6_B5 0.3748862 trig wave function A
A1_B1 0.3318861 properties of fractions A

The questions that load most strongly on this factor are again dominated by the MATH Group B questions.

The new “in context” question, A0_B7, has a disappointingly low factor loading (and low information, shown above). Perhaps the context is sufficiently routine for these students.

2 Factor

Here we also investigate the 2-factor solution, to see whether these factors are interpretable.

Show factanal output

fitfact2 <- factanal(item_scores_B, factors = 2, rotation = "varimax")
print(fitfact2, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 2, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A9_B8  A10_B9 A12_B12 A13_B13 
##    0.89    0.75    0.68    0.80    0.85    0.71    0.74    0.75    0.72    0.69 
## A14_B14 A15_B15 A16_B16 A17_B17 A18_B18 A19_B19 A20_B20   A0_B7  A0_B10  A0_B11 
##    0.58    0.84    0.57    0.45    0.77    0.68    0.68    0.82    0.56    0.69 
## 
## Loadings:
##         Factor1 Factor2
## A4_B3   0.50           
## A14_B14 0.59           
## A20_B20 0.53           
## A0_B10  0.64           
## A16_B16         0.63   
## A17_B17         0.73   
## A1_B1                  
## A3_B2   0.47           
## A5_B4   0.36           
## A6_B5   0.36           
## A7_B6   0.50           
## A9_B8   0.35    0.37   
## A10_B9  0.40           
## A12_B12 0.37    0.37   
## A13_B13 0.44    0.34   
## A15_B15         0.33   
## A18_B18         0.41   
## A19_B19 0.49           
## A0_B7   0.41           
## A0_B11  0.47           
## 
##                Factor1 Factor2
## SS loadings       3.50    2.27
## Proportion Var    0.18    0.11
## Cumulative Var    0.18    0.29
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 934.49 on 151 degrees of freedom.
## The p-value is 3.79e-113
load2 <- tidy(fitfact2)
load2_plot <- load2 %>%
  rename(question = variable) %>% 
  left_join(test_versions %>% rename(question = label), by = "question") %>%
  ggplot(aes(x = fl1, y = fl2, colour = MATH_group, shape = MATH_group)) +
  geom_point() +
  geom_text_repel(aes(label = question), show.legend = FALSE, alpha = 0.6) +
  labs(
    x = "Factor 1 (of 2)",
    y = "Factor 2 (of 2)"
  ) +
  scale_colour_manual("MATH group", values = MATH_colours) +
  scale_shape_manual(name = "MATH group", values = c(19, 17, 18)) +
  theme_minimal()


load2_plot +
  labs(
    title = "Standardised Loadings",
    subtitle = "Showing the 2-factor model"
  )

ggsave("output/uoe_pre-vs-post_2factor.pdf", units = "cm", width = 12, height = 8, dpi = 300,
       plot = load2_plot)
main_factors <- load2 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)

load2 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors, by = "variable") %>%
  left_join(test_versions %>% select(variable = label, description, MATH_group), by = "variable") %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange adjectives by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
Main Factor Question fl1 fl2 description MATH_group
fl1 A0_B10 0.644 0.143 2x2 system possibilities B
A14_B14 0.586 0.277 find minimum gradient of cubic B
A20_B20 0.534 0.195 product rule with given values B
A4_B3 0.502 0.27 completing the square A
A7_B6 0.496 0.218 graphical vector sum B
A19_B19 0.491 0.284 area between curve and x-axis (in 2 parts) B
A0_B11 0.475 0.294 using max and min functions C
A3_B2 0.469 0.181 composition of functions B
A13_B13 0.436 0.339 equation of tangent A
A0_B7 0.406 0.1 angle between 3d vectors (in context) B
A10_B9 0.403 0.293 identify graph of rational functions B
A5_B4 0.362 0.262 trig double angle formula A
A6_B5 0.359 0.141 trig wave function A
A1_B1 0.28 0.176 properties of fractions A
fl2 A17_B17 0.134 0.728 chain rule A
A16_B16 0.162 0.633 trig chain rule A
A18_B18 0.259 0.409 definite integral A
A12_B12 0.372 0.374 find point with given slope A
A9_B8 0.347 0.374 simplify logs A
A15_B15 0.228 0.328 find and classify stationary points of cubic A

TODO interpretation

About this report

This report supports the analysis in the following paper:

[citation needed]

Packages

In this analysis we used the following packages. You can learn more about each one by clicking on the links below.

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots